Add `dtype-i128` feature flag and `Int128Type` #6374
Decimal arithmetics require manipulating the DataType when doing some operations, i.e.: changing precision/scale
Thanks for the PR. I think we should not think of `decimal` yet and just implement `i128` as its own type.
The future decimal type will be a wrapper around this and that type will have to figure out how to deal with arithmetic; it is not a concern of the physical `i128` type. So the arithmetic can continue as is.
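The layering suggested in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `Int128Column` and `DecimalColumn` are hypothetical names, not Polars types, and a plain `Vec<i128>` stands in for the real chunked array.

```rust
/// Hypothetical physical column: plain 128-bit integers, no decimal semantics.
#[derive(Debug, Clone, PartialEq)]
pub struct Int128Column(pub Vec<i128>);

impl Int128Column {
    /// Element-wise addition, the same as for any other integer type.
    pub fn add(&self, other: &Int128Column) -> Int128Column {
        Int128Column(self.0.iter().zip(&other.0).map(|(a, b)| a + b).collect())
    }
}

/// Hypothetical logical wrapper: it owns the scale and decides how to
/// reuse the physical arithmetic.
#[derive(Debug, Clone, PartialEq)]
pub struct DecimalColumn {
    pub physical: Int128Column,
    pub scale: u32,
}

impl DecimalColumn {
    /// Adds two decimal columns. Returns `None` when the scales differ;
    /// a real implementation would presumably rescale instead.
    pub fn add(&self, other: &DecimalColumn) -> Option<DecimalColumn> {
        if self.scale != other.scale {
            return None;
        }
        Some(DecimalColumn {
            physical: self.physical.add(&other.physical),
            scale: self.scale,
        })
    }
}
```

The point of the split is that `Int128Column::add` never has to know about precision or scale; only the wrapper does.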
@@ -82,6 +82,8 @@ impl_polars_datatype!(Int8Type, Int8, i8);
impl_polars_datatype!(Int16Type, Int16, i16);
impl_polars_datatype!(Int32Type, Int32, i32);
impl_polars_datatype!(Int64Type, Int64, i64);
#[cfg(feature = "dtype-i128")]
impl_polars_datatype!(Int128Type, Unknown, i128);
`Unknown` should be `Int128Type`
Presumably, you mean `Int128` (which would be `DataType::Int128`)?
If so, I don't think that will work because I think `DataType` will eventually be something like `DataType::Decimal(precision, scale)`, right?
Yes, but then I think we should make the `DataType` something like `DataType::Decimal(Option<precision, scale>)`. Then we can fill in the blanks later.
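A minimal sketch of what such a variant might look like follows. `Option` takes a single type parameter, so the sketch assumes a `(precision, scale)` tuple; the enum here is a trimmed, hypothetical stand-in rather than the actual Polars `DataType`, and the `usize` field types and the `precision_scale` helper are assumptions made for illustration.

```rust
/// Hypothetical, heavily trimmed stand-in for a `DataType` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    /// `None` means precision and scale are not known yet and can be
    /// filled in later.
    Decimal(Option<(usize, usize)>),
}

impl DataType {
    /// Returns `(precision, scale)` when the type is a decimal and both are known.
    pub fn precision_scale(&self) -> Option<(usize, usize)> {
        match self {
            DataType::Decimal(ps) => *ps,
            _ => None,
        }
    }
}
```

With this shape, `DataType::Decimal(None)` can be used for the physical `i128` column today without committing to how precision and scale are tracked.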
@ritchie46 Thanks for the comments. The issue is that So when we introduce Unless what you're suggesting is that we don't use
@ritchie46 Any additional feedback or guidance you could provide? I might have some more time over the next few days to look at this.
Could you make clippy happy, then we can go forward.
Thanks @plaflamme. The next step is adding this
If we can get this merged, that would be preferable, yeah. Though I'd be worried with the
Nope, it is an incremental effort.
Thanks @plaflamme. Next up a
@ritchie46 great, thanks for merging this. Do you have any pointers to get me started on
Yes.. And you need to make newtype that is
@ritchie46 am I on the right track with this?
Yes! Most methods can be simply dispatched to ChunkedArray<128> from the Series implementation. If you open a PR we could track this.
@ritchie46 here's a PR: #6374
@plaflamme Thanks for your work on this! I'm very excited to see int128 support becoming full-fledged (and yet another thing to distance polars from pandas...). @ritchie46 I wonder what the approximate roadmap is for getting int128/decimals supported all the way through to the Python layer. What are the next steps from here? (I could try to contribute myself if the tasks are reasonably narrow-scoped and if it helps to speed things up.)
@aldanor I've started some work in this branch. I don't really know what I'm doing, I'm mostly following breadcrumbs from datetime and other types that are similar to @ritchie46 any guidance would be appreciated
@plaflamme Maybe open a PR in your own repo and tag me? Skimming through, I might have a few random suggestions - would be easier to comment in the PR
Actually, one (perhaps weird) question to discuss before we hack too deep: could we just get rid of It won't change any computational logic or affect any casting as it's used solely for validation. Can't we just always use max-precision, i.e. 38, internally? Off the top of my head, the only thing it can visibly affect is formatting, i.e. 123.45 with (5, 2) will get printed as "123.45" whereas 123.45 with (7, 3) will get printed as 0123.450, IIRC. Well... that, plus validation errors that may pop up during runtime where your value doesn't fit declared precision – do we want that? Another related question is whether scale (and precision if it's used) should be an
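To make the roles of scale and precision concrete, here is a small sketch, assuming the decimal is stored as an unscaled `i128` plus a scale of at most 38. `format_decimal` and `fits_precision` are hypothetical helpers, not Polars or arrow2 functions, and the sketch does not try to reproduce the exact padded output recalled in the comment above. It only shows that scale drives formatting, while precision acts as a pure validation bound.

```rust
/// Formats an unscaled `i128` as a decimal string with `scale` fractional digits.
/// Assumes `scale <= 38` so that `10^scale` fits in a `u128`.
pub fn format_decimal(value: i128, scale: u32) -> String {
    if scale == 0 {
        return value.to_string();
    }
    let sign = if value < 0 { "-" } else { "" };
    let abs = value.unsigned_abs();
    let pow = 10u128.pow(scale);
    // Integer part, then the fractional part left-padded with zeros to `scale` digits.
    format!("{}{}.{:0width$}", sign, abs / pow, abs % pow, width = scale as usize)
}

/// Returns true if the unscaled value has at most `precision` decimal digits.
/// Precision is capped at 38, the maximum mentioned in the discussion.
pub fn fits_precision(value: i128, precision: u32) -> bool {
    value.unsigned_abs() < 10u128.pow(precision.min(38))
}
```

For example, `format_decimal(12345, 2)` and `format_decimal(123450, 3)` describe the same number at different scales, and only `fits_precision` cares whether the declared precision is 5 or 7.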
This adds a new datatype: `Int128Type`, which is the primitive type used by arrow to represent `Decimal` types. This new type is behind a `dtype-i128` feature flag.

Introducing this type requires abstracting over the arithmetic used for `PrimitiveArray` because decimal arithmetic must keep track of changes to `precision`/`scale`. This PR adds a new trait `ArrayArithmetics` which delegates to `basic` for all primitive types except `i128`, and to `decimal` for `i128`. There's probably a better approach, but this is the extent of what my Rust-fu is capable of delivering.

There are a few problems:
- `div_scalar` in `basic` and `decimal` do not have equivalent signatures: it is unclear how we can abstract over this
- `rem` and `rem_scalar` are not implemented in `decimal` (this doesn't seem like a blocker)

This is obviously just a draft to get some feedback.
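The delegation described above could be sketched as follows. This is not the code from the PR: the trait is reduced to a single `add` method over slices, and `basic_add` and `decimal_add` are hypothetical stand-ins for the `basic` and `decimal` kernels, written here so that the example is self-contained.

```rust
use std::ops::Add;

/// Stand-in for a `basic` kernel: plain element-wise addition.
fn basic_add<T: Copy + Add<Output = T>>(lhs: &[T], rhs: &[T]) -> Vec<T> {
    lhs.iter().zip(rhs).map(|(a, b)| *a + *b).collect()
}

/// Stand-in for a `decimal` kernel: checked addition on unscaled values.
/// A real decimal kernel would also reconcile precision and scale.
fn decimal_add(lhs: &[i128], rhs: &[i128]) -> Vec<i128> {
    lhs.iter()
        .zip(rhs)
        .map(|(a, b)| a.checked_add(*b).expect("decimal overflow"))
        .collect()
}

/// Hypothetical, reduced version of the trait: each native type picks its kernel.
pub trait ArrayArithmetics: Sized + Copy {
    fn add(lhs: &[Self], rhs: &[Self]) -> Vec<Self>;
}

/// Every primitive type except `i128` delegates to the basic kernel.
macro_rules! impl_basic {
    ($($t:ty),*) => {$(
        impl ArrayArithmetics for $t {
            fn add(lhs: &[Self], rhs: &[Self]) -> Vec<Self> {
                basic_add(lhs, rhs)
            }
        }
    )*};
}
impl_basic!(i8, i16, i32, i64);

/// `i128` delegates to the decimal kernel instead.
impl ArrayArithmetics for i128 {
    fn add(lhs: &[Self], rhs: &[Self]) -> Vec<Self> {
        decimal_add(lhs, rhs)
    }
}
```

Generic code can then call `T::add(lhs, rhs)` without knowing which kernel is used. The `div_scalar` mismatch listed above is exactly the kind of case where a single trait signature like this stops fitting both kernels.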